Classification of German Newspaper Comments
Abstract
Online news has gradually become an inherent part of many people’s everyday life, with the media also enabling a social and interactive consumption of news. Readers openly express their perspectives and emotions about a current event by commenting on news articles. They also form online communities and interact with each other by replying to other users’ comments. Because these comments play an active and significant role in the diffusion of information, automatically gaining insight into their content is an interesting task. We are especially interested in finding systematic differences among the user comments from different newspapers. To this end, we propose the following classification task: given a news comment thread of a particular article, identify the newspaper it comes from. Our corpus consists of six well-known German newspapers and their comments. We propose two experimental settings using SVM classifiers built on comment- and article-based features. We achieve precision of up to 90% for individual newspapers.
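As a rough illustration of how precision for an individual newspaper can be computed, the sketch below counts, for each predicted newspaper label, the share of predictions that were correct. The function name `per_class_precision` and the labels "A", "B", "C" are hypothetical placeholders, and the toy predictions are not data from the paper; the SVM training step itself is not shown.

```python
from collections import Counter


def per_class_precision(gold, predicted):
    """Return {label: precision} for every label predicted at least once.

    Precision for a label = correct predictions of that label
                            / all predictions of that label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in zip(gold, predicted) if g == p)
    return {label: correct_counts[label] / n
            for label, n in predicted_counts.items()}


# Hypothetical newspaper labels for five comment threads (illustrative only).
gold = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]

precision = per_class_precision(gold, predicted)
for label in sorted(precision):
    print(f"newspaper {label}: precision = {precision[label]:.2f}")
```

Reporting precision per newspaper, rather than one overall score, matches the abstract's phrasing of results "for individual newspapers": a classifier may be highly reliable when it predicts one source and much less so for another.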
Similar resources
Treebank Profiling of Spoken and Written German
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ’die tageszeitung’ (taz). The approach can be used more generally as a means of disti...
What Linguists Always Wanted to Know about German and Did not Know How to Estimate
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ‘die tageszeitung’ (taz). The approach can be used more generally as a means of dis...
Classifying Number Expressions in German Corpora
Number and date expressions are essential information items in corpora and therefore play a major role in various text mining applications. However, so far number expressions were investigated in a rather superficial manner. In this paper we introduce a comprehensive number classification and present promising, initial results of a classification experiment using various Machine Learning algori...
An XML-based Tool for Tracking English Inclusions in German Text
The use of lexicons and corpora advances both linguistic research and performances of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring c...
Self Embedded Relative Clauses in a Corpus of German Newspaper Texts
The distribution of center self-embeddings and extrapositions in German is assumed to reflect a universal performance strategy of minimizing memory load during parsing. Self-embedded relative clauses of embedding depth 2 were semi-automatically analysed in a treebank of German newspaper texts. Clause length and especially extraposition distance are found as the main distinctive parameters betwe...
Publication date: 2016